Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 8679298 |
| Missing cells | 22238246 |
| Missing cells (%) | 17.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 993.3 MiB |
| Average record size in memory | 120.0 B |
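The two ratio rows above follow from the raw counts. Missing cells (%) is the number of missing cells divided by all cells (15 variables × 8,679,298 observations). The average record size is the total memory divided by the number of rows. The sketch below recomputes both. `overview_ratios` is a hypothetical helper, and treating 1 MiB as 2**20 bytes is an assumption about the report's units.

```python
def overview_ratios(n_vars: int, n_obs: int, missing_cells: int, total_mib: float) -> tuple:
    """Return (missing-cell percentage, average record size in bytes)."""
    total_cells = n_vars * n_obs                    # one cell per variable per observation
    missing_pct = 100 * missing_cells / total_cells
    avg_record_bytes = total_mib * 2**20 / n_obs    # assumption: 1 MiB = 2**20 bytes
    return missing_pct, avg_record_bytes

missing_pct, avg_bytes = overview_ratios(15, 8_679_298, 22_238_246, 993.3)
print(f"{missing_pct:.1f}% missing, {avg_bytes:.1f} B per record")
```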
Variable types
| Numeric | 4 |
|---|---|
| Categorical | 7 |
| Unsupported | 4 |
Alerts
| EntryDate has a high cardinality: 2911 distinct values | High cardinality |
|---|---|
| EntryTime has a high cardinality: 27227 distinct values | High cardinality |
| Code has a high cardinality: 2025 distinct values | High cardinality |
| Analyte has a high cardinality: 3479 distinct values | High cardinality |
| ValueNumber has a high cardinality: 23084 distinct values | High cardinality |
| ValueText has a high cardinality: 32847 distinct values | High cardinality |
| Unit has a high cardinality: 195 distinct values | High cardinality |
| Report is highly overall correlated with ID | High correlation |
| ID is highly overall correlated with Report | High correlation |
| Code has 8106112 (93.4%) missing values | Missing |
| NCLP has 573524 (6.6%) missing values | Missing |
| ValueNumber has 1338881 (15.4%) missing values | Missing |
| ValueText has 7340434 (84.6%) missing values | Missing |
| RefHigh has 1747017 (20.1%) missing values | Missing |
| RefLow has 1747253 (20.1%) missing values | Missing |
| Unit has 1383102 (15.9%) missing values | Missing |
| RefHigh is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| RefLow is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| NLCP_E is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| NCLP_E is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
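The per-column counts in the Missing alerts, plus the 1,923 missing Analyte values reported in that variable's section (not flagged as an alert), add up exactly to the 22,238,246 missing cells in the overview. The snippet below is a quick arithmetic check using counts copied from this report.

```python
# Missing counts per column: the seven alerts above plus Analyte's own section.
missing_by_column = {
    "Code": 8_106_112,
    "NCLP": 573_524,
    "ValueNumber": 1_338_881,
    "ValueText": 7_340_434,
    "RefHigh": 1_747_017,
    "RefLow": 1_747_253,
    "Unit": 1_383_102,
    "Analyte": 1_923,
}
total_missing = sum(missing_by_column.values())
print(total_missing)  # 22238246, the overview's "Missing cells" figure
```

The listed columns already account for every missing cell. The remaining columns, including the unsupported NLCP_E and NCLP_E, therefore contain no missing values.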
Reproduction
| Analysis started | 2022-11-26 18:44:53.784016 |
|---|---|
| Analysis finished | 2022-11-26 18:48:03.954673 |
| Duration | 3 minutes and 10.17 seconds |
| Software version | pandas-profiling v3.5.0 |
| Download configuration | config.json |
Patient
Real number (ℝ)
| Distinct | 14158 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 455221.28 |
| Minimum | 85 |
|---|---|
| Maximum | 1472547 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.2 MiB |
Quantile statistics
| Minimum | 85 |
|---|---|
| 5-th percentile | 8140 |
| Q1 | 60424 |
| median | 180869 |
| Q3 | 1181123 |
| 95-th percentile | 1243465 |
| Maximum | 1472547 |
| Range | 1472462 |
| Interquartile range (IQR) | 1120699 |
Descriptive statistics
| Standard deviation | 495840.26 |
|---|---|
| Coefficient of variation (CV) | 1.0892291 |
| Kurtosis | -1.238934 |
| Mean | 455221.28 |
| Median Absolute Deviation (MAD) | 155062 |
| Skewness | 0.75973107 |
| Sum | 3.9510011 × 10¹² |
| Variance | 2.4585757 × 10¹¹ |
| Monotonicity | Not monotonic |
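Several descriptive rows are derived from the others:

- CV is the standard deviation divided by the mean.
- IQR is Q3 - Q1.
- Range is Maximum - Minimum.
- Variance is the squared standard deviation.

The short sketch below recomputes them from the Patient figures. The helper name is illustrative.

```python
def derived_stats(mean, std, q1, q3, minimum, maximum):
    """Recompute the report's derived rows from its primary statistics."""
    return {
        "cv": std / mean,             # coefficient of variation
        "iqr": q3 - q1,               # interquartile range
        "range": maximum - minimum,
        "variance": std ** 2,
    }

patient = derived_stats(455221.28, 495840.26, 60424, 1181123, 85, 1472547)
print(patient["iqr"], patient["range"])
```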
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 210904 | 12656 | 0.1% |
| 1203600 | 12262 | 0.1% |
| 1248228 | 11517 | 0.1% |
| 193098 | 10777 | 0.1% |
| 210732 | 10225 | 0.1% |
| 583928 | 9983 | 0.1% |
| 1205878 | 9664 | 0.1% |
| 212373 | 9280 | 0.1% |
| 1205251 | 9147 | 0.1% |
| 1236850 | 9122 | 0.1% |
| Other values (14148) | 8574665 | 98.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 85 | 1683 | < 0.1% |
| 99 | 476 | < 0.1% |
| 104 | 574 | < 0.1% |
| 124 | 1154 | < 0.1% |
| 126 | 594 | < 0.1% |
| 131 | 496 | < 0.1% |
| 153 | 1021 | < 0.1% |
| 224 | 1026 | < 0.1% |
| 254 | 707 | < 0.1% |
| 287 | 2140 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1472547 | 1050 | < 0.1% |
| 1340726 | 1863 | < 0.1% |
| 1305340 | 2428 | < 0.1% |
| 1298848 | 372 | < 0.1% |
| 1290559 | 137 | < 0.1% |
| 1286279 | 464 | < 0.1% |
| 1284508 | 822 | < 0.1% |
| 1283452 | 503 | < 0.1% |
| 1283279 | 265 | < 0.1% |
| 1272007 | 207 | < 0.1% |
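The "< 0.1%" and "> 99.9%" cells follow from rounding each share to one decimal place. The stand-alone sketch below reproduces the values in these tables under an assumed rule:

- A share that rounds to 0.0% prints as "< 0.1%".
- A share that rounds to 100.0% without reaching it prints as "> 99.9%".

pandas-profiling's own formatter may differ in detail.

```python
def fmt_percent(fraction: float) -> str:
    """Format a share like the Frequency (%) column (assumed rounding rule)."""
    if fraction > 0 and round(fraction, 3) == 0:
        return "< 0.1%"    # too small to show at one decimal place
    if fraction < 1 and round(fraction, 3) == 1:
        return "> 99.9%"   # would otherwise display as 100.0%
    return f"{fraction * 100:.1f}%"

N = 8_679_298  # number of observations
for count in (1_683, 9_122, 8_574_665):
    print(count, fmt_percent(count / N))
```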
Report
Real number (ℝ)
| Distinct | 629860 |
|---|---|
| Distinct (%) | 7.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18955521 |
| Minimum | 1237515 |
|---|---|
| Maximum | 27823222 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.2 MiB |
Quantile statistics
| Minimum | 1237515 |
|---|---|
| 5-th percentile | 1533431 |
| Q1 | 17679873 |
| median | 20802937 |
| Q3 | 24266728 |
| 95-th percentile | 27118895 |
| Maximum | 27823222 |
| Range | 26585707 |
| Interquartile range (IQR) | 6586855 |
Descriptive statistics
| Standard deviation | 7645952.7 |
|---|---|
| Coefficient of variation (CV) | 0.40336283 |
| Kurtosis | 0.69463298 |
| Mean | 18955521 |
| Median Absolute Deviation (MAD) | 3270230 |
| Skewness | -1.3245439 |
| Sum | 1.6452062 × 10¹⁴ |
| Variance | 5.8460592 × 10¹³ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 26739266 | 195 | < 0.1% |
| 26740660 | 195 | < 0.1% |
| 26739216 | 195 | < 0.1% |
| 26739424 | 195 | < 0.1% |
| 26740099 | 191 | < 0.1% |
| 26739267 | 191 | < 0.1% |
| 26740661 | 191 | < 0.1% |
| 26739217 | 191 | < 0.1% |
| 26737746 | 191 | < 0.1% |
| 26739425 | 191 | < 0.1% |
| Other values (629850) | 8677372 | > 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1237515 | 2 | < 0.1% |
| 1237550 | 5 | < 0.1% |
| 1237586 | 14 | < 0.1% |
| 1237620 | 4 | < 0.1% |
| 1237625 | 2 | < 0.1% |
| 1237633 | 2 | < 0.1% |
| 1237635 | 9 | < 0.1% |
| 1237652 | 13 | < 0.1% |
| 1237653 | 4 | < 0.1% |
| 1237658 | 5 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 27823222 | 3 | < 0.1% |
| 27823214 | 9 | < 0.1% |
| 27823209 | 9 | < 0.1% |
| 27823190 | 1 | < 0.1% |
| 27823188 | 9 | < 0.1% |
| 27823174 | 3 | < 0.1% |
| 27823149 | 1 | < 0.1% |
| 27823105 | 19 | < 0.1% |
| 27823103 | 16 | < 0.1% |
| 27823101 | 1 | < 0.1% |
ID
Real number (ℝ)
| Distinct | 8604099 |
|---|---|
| Distinct (%) | 99.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 69795856 |
| Minimum | 1282 |
|---|---|
| Maximum | 1.9273763 × 10⁸ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.2 MiB |
Quantile statistics
| Minimum | 1282 |
|---|---|
| 5-th percentile | 6458270.8 |
| Q1 | 23188354 |
| median | 52265658 |
| Q3 | 1.1075503 × 10⁸ |
| 95-th percentile | 1.6219078 × 10⁸ |
| Maximum | 1.9273763 × 10⁸ |
| Range | 1.9273635 × 10⁸ |
| Interquartile range (IQR) | 87566671 |
Descriptive statistics
| Standard deviation | 53137093 |
|---|---|
| Coefficient of variation (CV) | 0.7613216 |
| Kurtosis | -0.84762724 |
| Mean | 69795856 |
| Median Absolute Deviation (MAD) | 35994544 |
| Skewness | 0.58835003 |
| Sum | 6.0577903 × 10¹⁴ |
| Variance | 2.8235507 × 10¹⁵ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 27803180 | 2 | < 0.1% |
| 21865565 | 2 | < 0.1% |
| 20650535 | 2 | < 0.1% |
| 21656037 | 2 | < 0.1% |
| 21656038 | 2 | < 0.1% |
| 21656039 | 2 | < 0.1% |
| 21656040 | 2 | < 0.1% |
| 21656041 | 2 | < 0.1% |
| 20650536 | 2 | < 0.1% |
| 20650537 | 2 | < 0.1% |
| Other values (8604089) | 8679278 | > 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1282 | 1 | < 0.1% |
| 1491 | 1 | < 0.1% |
| 4796 | 1 | < 0.1% |
| 4797 | 1 | < 0.1% |
| 4798 | 1 | < 0.1% |
| 4799 | 1 | < 0.1% |
| 4800 | 1 | < 0.1% |
| 4801 | 1 | < 0.1% |
| 4802 | 1 | < 0.1% |
| 4803 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 192737629 | 1 | < 0.1% |
| 192737628 | 1 | < 0.1% |
| 192737474 | 1 | < 0.1% |
| 192737473 | 1 | < 0.1% |
| 192737472 | 1 | < 0.1% |
| 192737424 | 1 | < 0.1% |
| 192737423 | 1 | < 0.1% |
| 192737422 | 1 | < 0.1% |
| 192737421 | 1 | < 0.1% |
| 192737420 | 1 | < 0.1% |
EntryDate
Categorical
| Distinct | 2911 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 66.2 MiB |
| 03.10.2022 | 9468 |
|---|---|
| 12.09.2022 | 8834 |
| 07.09.2022 | 8647 |
| 21.11.2022 | 8515 |
| 05.09.2022 | 8333 |
| Other values (2906) | 8635501 |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 86792980 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
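Distinct and Unique measure different things. EntryDate has 2,911 distinct dates, but only one of them occurs in exactly one row. The minimal illustration below uses the first sample values. The helper name is hypothetical.

```python
from collections import Counter

def distinct_and_unique(values):
    """Distinct = number of different values; Unique = values occurring exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

print(distinct_and_unique(["30.09.2022", "30.09.2022", "16.04.2015"]))  # (2, 1)
```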
Sample
| 1st row | 30.09.2022 |
|---|---|
| 2nd row | 30.09.2022 |
| 3rd row | 16.04.2015 |
| 4th row | 16.04.2015 |
| 5th row | 16.04.2015 |
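EntryDate is profiled as text, so date arithmetic needs an explicit parse. Every value is exactly 10 characters long. The sketch below assumes every value follows the DD.MM.YYYY pattern seen in the sample. The length table and the 11 distinct characters are consistent with that assumption, but the profile does not prove it.

```python
from datetime import date, datetime

def parse_entry_date(text: str) -> date:
    """Parse a day.month.year string such as '30.09.2022'."""
    return datetime.strptime(text, "%d.%m.%Y").date()

print(parse_entry_date("16.04.2015"))  # 2015-04-16
```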
Common Values
| Value | Count | Frequency (%) |
| 03.10.2022 | 9468 | 0.1% |
| 12.09.2022 | 8834 | 0.1% |
| 07.09.2022 | 8647 | 0.1% |
| 21.11.2022 | 8515 | 0.1% |
| 05.09.2022 | 8333 | 0.1% |
| 04.10.2022 | 7943 | 0.1% |
| 03.05.2022 | 7581 | 0.1% |
| 10.10.2022 | 7565 | 0.1% |
| 02.05.2022 | 7514 | 0.1% |
| 10.01.2022 | 7488 | 0.1% |
| Other values (2901) | 8597410 | 99.1% |
Length
Histogram of lengths of the category
Most occurring words
| Value | Count | Frequency (%) |
| 03.10.2022 | 9468 | 0.1% |
| 12.09.2022 | 8834 | 0.1% |
| 07.09.2022 | 8647 | 0.1% |
| 21.11.2022 | 8515 | 0.1% |
| 05.09.2022 | 8333 | 0.1% |
| 04.10.2022 | 7943 | 0.1% |
| 03.05.2022 | 7581 | 0.1% |
| 10.10.2022 | 7565 | 0.1% |
| 02.05.2022 | 7514 | 0.1% |
| 10.01.2022 | 7488 | 0.1% |
| Other values (2901) | 8597410 | 99.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 20536993 | 23.7% |
| 2 | 18261521 | 21.0% |
| . | 17358596 | 20.0% |
| 1 | 14100046 | 16.2% |
| 5 | 2774007 | 3.2% |
| 6 | 2722345 | 3.1% |
| 9 | 2706267 | 3.1% |
| 7 | 2398301 | 2.8% |
| 8 | 2326732 | 2.7% |
| 3 | 2018337 | 2.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 69434384 | 80.0% |
| Other Punctuation | 17358596 | 20.0% |
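These category names are the Unicode general categories of each character. Digits are Nd (Decimal Number) and the dot is Po (Other Punctuation). A 10-character date such as 30.09.2022 therefore contributes 8 and 2 characters to the two rows, which gives the 80% / 20% split. The standard-library sketch below performs this tally. The name mapping covers only the categories that appear in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes seen in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter", "Nd": "Decimal Number",
    "Pc": "Connector Punctuation", "Pd": "Dash Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Po": "Other Punctuation", "Sm": "Math Symbol",
    "Sk": "Modifier Symbol", "So": "Other Symbol", "Zs": "Space Separator",
    "Cc": "Control",
}

def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts

print(category_counts(["30.09.2022"]))
```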
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 20536993 | 29.6% |
| 2 | 18261521 | 26.3% |
| 1 | 14100046 | 20.3% |
| 5 | 2774007 | 4.0% |
| 6 | 2722345 | 3.9% |
| 9 | 2706267 | 3.9% |
| 7 | 2398301 | 3.5% |
| 8 | 2326732 | 3.4% |
| 3 | 2018337 | 2.9% |
| 4 | 1589835 | 2.3% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 17358596 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 86792980 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 20536993 | 23.7% |
| 2 | 18261521 | 21.0% |
| . | 17358596 | 20.0% |
| 1 | 14100046 | 16.2% |
| 5 | 2774007 | 3.2% |
| 6 | 2722345 | 3.1% |
| 9 | 2706267 | 3.1% |
| 7 | 2398301 | 2.8% |
| 8 | 2326732 | 2.7% |
| 3 | 2018337 | 2.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 86792980 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 20536993 | 23.7% |
| 2 | 18261521 | 21.0% |
| . | 17358596 | 20.0% |
| 1 | 14100046 | 16.2% |
| 5 | 2774007 | 3.2% |
| 6 | 2722345 | 3.1% |
| 9 | 2706267 | 3.1% |
| 7 | 2398301 | 2.8% |
| 8 | 2326732 | 2.7% |
| 3 | 2018337 | 2.3% |
EntryTime
Categorical
| Distinct | 27227 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 66.2 MiB |
| 30.12.1899 6:00:00 | 505282 |
|---|---|
| 30.12.1899 7:00:00 | 169117 |
| 30.12.1899 6:30:00 | 146460 |
| 30.12.1899 7:30:00 | 105216 |
| 30.12.1899 8:00:00 | 80819 |
| Other values (27222) | 7672404 |
Length
| Max length | 19 |
|---|---|
| Median length | 18 |
| Mean length | 18.126613 |
| Min length | 18 |
Characters and Unicode
| Total characters | 157326278 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 3 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 12782 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | 30.12.1899 7:42:00 |
|---|---|
| 2nd row | 30.12.1899 7:42:00 |
| 3rd row | 30.12.1899 7:31:00 |
| 4th row | 30.12.1899 7:31:00 |
| 5th row | 30.12.1899 7:31:00 |
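Every EntryTime value starts with 30.12.1899. The word counts for this variable show that date 8,679,298 times, once per row. It is most likely a placeholder: 30 December 1899 is day zero of the OLE Automation date format used by Microsoft Access and Excel, so the column effectively stores only a time of day.

The sketch below rebuilds a full timestamp from the two columns. It assumes this interpretation and also assumes that EntryDate and EntryTime describe the same event. The function name is hypothetical.

```python
from datetime import datetime

def combine_entry(date_text: str, time_text: str) -> datetime:
    """Join EntryDate with the time-of-day part of EntryTime.

    EntryTime's date part (30.12.1899) is treated as a placeholder and ignored.
    """
    day = datetime.strptime(date_text, "%d.%m.%Y").date()
    clock = datetime.strptime(time_text, "%d.%m.%Y %H:%M:%S").time()
    return datetime.combine(day, clock)

print(combine_entry("30.09.2022", "30.12.1899 7:42:00"))  # 2022-09-30 07:42:00
```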
Common Values
| Value | Count | Frequency (%) |
| 30.12.1899 6:00:00 | 505282 | 5.8% |
| 30.12.1899 7:00:00 | 169117 | 1.9% |
| 30.12.1899 6:30:00 | 146460 | 1.7% |
| 30.12.1899 7:30:00 | 105216 | 1.2% |
| 30.12.1899 8:00:00 | 80819 | 0.9% |
| 30.12.1899 10:00:00 | 80726 | 0.9% |
| 30.12.1899 7:15:00 | 77290 | 0.9% |
| 30.12.1899 6:36:00 | 75669 | 0.9% |
| 30.12.1899 7:20:00 | 75477 | 0.9% |
| 30.12.1899 6:37:00 | 75223 | 0.9% |
| Other values (27217) | 7288019 | 84.0% |
Length
Histogram of lengths of the category
Most occurring words
| Value | Count | Frequency (%) |
| 30.12.1899 | 8679298 | 50.0% |
| 6:00:00 | 505282 | 2.9% |
| 7:00:00 | 169117 | 1.0% |
| 6:30:00 | 146460 | 0.8% |
| 7:30:00 | 105216 | 0.6% |
| 8:00:00 | 80819 | 0.5% |
| 10:00:00 | 80726 | 0.5% |
| 7:15:00 | 77290 | 0.4% |
| 6:36:00 | 75669 | 0.4% |
| 7:20:00 | 75477 | 0.4% |
| Other values (27218) | 7363242 | 42.4% |
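Unlike Common Values, this table counts whitespace-separated words and divides by the total number of words, not rows. Each EntryTime value holds two words. As a result, 30.12.1899 appears in every row yet accounts for only 50.0% of words, and 6:00:00 (5.8% of rows) drops to 2.9%.

The simplified sketch below shows that computation. The tokenization rule is an assumption about how the report builds the table: lower-case the text, split on whitespace, and strip punctuation at both ends of each token.

```python
import string
from collections import Counter

def word_frequencies(values):
    """Share of all words taken by each word (approximation of the word table)."""
    words = Counter(
        token.strip(string.punctuation)
        for value in values
        for token in value.lower().split()
    )
    total = sum(words.values())
    return {word: count / total for word, count in words.items()}

shares = word_frequencies(["30.12.1899 6:00:00", "30.12.1899 7:00:00"])
print(shares["30.12.1899"])  # 0.5
```

Under this rule, a token made only of punctuation (such as a lone "-") becomes an empty word, which would appear as a blank Value cell.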
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 30232683 | 19.2% |
| 1 | 20535435 | 13.1% |
| 9 | 18477092 | 11.7% |
| . | 17358596 | 11.0% |
| : | 17358596 | 11.0% |
| 3 | 11384724 | 7.2% |
| 2 | 10814043 | 6.9% |
| 8 | 10284590 | 6.5% |
| (space) | 8679298 | 5.5% |
| 7 | 4398647 | 2.8% |
| Other values (3) | 7802574 | 5.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 113929788 | 72.4% |
| Other Punctuation | 34717192 | 22.1% |
| Space Separator | 8679298 | 5.5% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 30232683 | 26.5% |
| 1 | 20535435 | 18.0% |
| 9 | 18477092 | 16.2% |
| 3 | 11384724 | 10.0% |
| 2 | 10814043 | 9.5% |
| 8 | 10284590 | 9.0% |
| 7 | 4398647 | 3.9% |
| 6 | 3472949 | 3.0% |
| 4 | 2204150 | 1.9% |
| 5 | 2125475 | 1.9% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 17358596 | 50.0% |
| : | 17358596 | 50.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 8679298 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 157326278 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 30232683 | 19.2% |
| 1 | 20535435 | 13.1% |
| 9 | 18477092 | 11.7% |
| . | 17358596 | 11.0% |
| : | 17358596 | 11.0% |
| 3 | 11384724 | 7.2% |
| 2 | 10814043 | 6.9% |
| 8 | 10284590 | 6.5% |
| (space) | 8679298 | 5.5% |
| 7 | 4398647 | 2.8% |
| Other values (3) | 7802574 | 5.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 157326278 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 30232683 | 19.2% |
| 1 | 20535435 | 13.1% |
| 9 | 18477092 | 11.7% |
| . | 17358596 | 11.0% |
| : | 17358596 | 11.0% |
| 3 | 11384724 | 7.2% |
| 2 | 10814043 | 6.9% |
| 8 | 10284590 | 6.5% |
| (space) | 8679298 | 5.5% |
| 7 | 4398647 | 2.8% |
| Other values (3) | 7802574 | 5.0% |
Code
Categorical
| Distinct | 2025 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 8106112 |
| Missing (%) | 93.4% |
| Memory size | 66.2 MiB |
| 41366 | 77143 |
|---|---|
| 20045 | 55042 |
| 20044 | 49730 |
| 40669 | 49714 |
| 40668 | 49712 |
| Other values (2020) | 291845 |
Length
| Max length | 34 |
|---|---|
| Median length | 5 |
| Mean length | 6.6410031 |
| Min length | 3 |
Characters and Unicode
| Total characters | 3806530 |
|---|---|
| Distinct characters | 52 |
| Distinct categories | 7 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 498 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | SZU-HEP_0004__R |
|---|---|
| 2nd row | ZU-OKM_0024__R |
| 3rd row | 20411 |
| 4th row | 20042 |
| 5th row | 20411 |
Common Values
| Value | Count | Frequency (%) |
| 41366 | 77143 | 0.9% |
| 20045 | 55042 | 0.6% |
| 20044 | 49730 | 0.6% |
| 40669 | 49714 | 0.6% |
| 40668 | 49712 | 0.6% |
| 20411 | 28244 | 0.3% |
| 20042 | 27766 | 0.3% |
| 8094 | 22631 | 0.3% |
| 9520 | 12429 | 0.1% |
| 9524 | 12429 | 0.1% |
| Other values (2015) | 188346 | 2.2% |
| (Missing) | 8106112 | 93.4% |
Length
Histogram of lengths of the category
Most occurring words
| Value | Count | Frequency (%) |
| 41366 | 77143 | 13.3% |
| 20045 | 55042 | 9.5% |
| 20044 | 49730 | 8.6% |
| 40669 | 49714 | 8.6% |
| 40668 | 49712 | 8.6% |
| 20411 | 28244 | 4.9% |
| 20042 | 27766 | 4.8% |
| 8094 | 22631 | 3.9% |
| 9520 | 12429 | 2.2% |
| 9524 | 12429 | 2.2% |
| Other values (2044) | 193136 | 33.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 700804 | 18.4% |
| 4 | 480708 | 12.6% |
| 6 | 378511 | 9.9% |
| 2 | 337420 | 8.9% |
| _ | 264513 | 6.9% |
| 1 | 247718 | 6.5% |
| M | 174898 | 4.6% |
| 9 | 162389 | 4.3% |
| 5 | 143039 | 3.8% |
| 3 | 139469 | 3.7% |
| Other values (42) | 777061 | 20.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2717681 | 71.4% |
| Uppercase Letter | 718544 | 18.9% |
| Connector Punctuation | 264513 | 6.9% |
| Dash Punctuation | 87989 | 2.3% |
| Lowercase Letter | 10035 | 0.3% |
| Space Separator | 4790 | 0.1% |
| Other Punctuation | 2978 | 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 174898 | 24.3% |
| L | 89636 | 12.5% |
| I | 88036 | 12.3% |
| E | 87777 | 12.2% |
| P | 87758 | 12.2% |
| K | 87703 | 12.2% |
| R | 60557 | 8.4% |
| S | 27132 | 3.8% |
| C | 3728 | 0.5% |
| H | 3227 | 0.4% |
| Other values (15) | 8092 | 1.1% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 2242 | 22.3% |
| e | 1916 | 19.1% |
| s | 958 | 9.5% |
| r | 958 | 9.5% |
| i | 958 | 9.5% |
| o | 958 | 9.5% |
| g | 958 | 9.5% |
| h | 958 | 9.5% |
| w | 103 | 1.0% |
| c | 23 | 0.2% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 700804 | 25.8% |
| 4 | 480708 | 17.7% |
| 6 | 378511 | 13.9% |
| 2 | 337420 | 12.4% |
| 1 | 247718 | 9.1% |
| 9 | 162389 | 6.0% |
| 5 | 143039 | 5.3% |
| 3 | 139469 | 5.1% |
| 8 | 103770 | 3.8% |
| 7 | 23853 | 0.9% |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 1312 | 44.1% |
| * | 1312 | 44.1% |
| , | 354 | 11.9% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 264513 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 87989 | 100.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 4790 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3077951 | 80.9% |
| Latin | 728579 | 19.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| M | 174898 | 24.0% |
| L | 89636 | 12.3% |
| I | 88036 | 12.1% |
| E | 87777 | 12.0% |
| P | 87758 | 12.0% |
| K | 87703 | 12.0% |
| R | 60557 | 8.3% |
| S | 27132 | 3.7% |
| C | 3728 | 0.5% |
| H | 3227 | 0.4% |
| Other values (26) | 18127 | 2.5% |
Common
| Value | Count | Frequency (%) |
| 0 | 700804 | 22.8% |
| 4 | 480708 | 15.6% |
| 6 | 378511 | 12.3% |
| 2 | 337420 | 11.0% |
| _ | 264513 | 8.6% |
| 1 | 247718 | 8.0% |
| 9 | 162389 | 5.3% |
| 5 | 143039 | 4.6% |
| 3 | 139469 | 4.5% |
| 8 | 103770 | 3.4% |
| Other values (6) | 119610 | 3.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3806530 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 700804 | 18.4% |
| 4 | 480708 | 12.6% |
| 6 | 378511 | 9.9% |
| 2 | 337420 | 8.9% |
| _ | 264513 | 6.9% |
| 1 | 247718 | 6.5% |
| M | 174898 | 4.6% |
| 9 | 162389 | 4.3% |
| 5 | 143039 | 3.8% |
| 3 | 139469 | 3.7% |
| Other values (42) | 777061 | 20.4% |
NCLP
Real number (ℝ)
| Distinct | 1619 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 573524 |
| Missing (%) | 6.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7552.2215 |
| Minimum | 53 |
|---|---|
| Maximum | 53191 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.2 MiB |
Quantile statistics
| Minimum | 53 |
|---|---|
| 5-th percentile | 921 |
| Q1 | 3078 |
| median | 4769 |
| Q3 | 12460 |
| 95-th percentile | 17339 |
| Maximum | 53191 |
| Range | 53138 |
| Interquartile range (IQR) | 9382 |
Descriptive statistics
| Standard deviation | 6188.5202 |
|---|---|
| Coefficient of variation (CV) | 0.81943044 |
| Kurtosis | 5.7807276 |
| Mean | 7552.2215 |
| Median Absolute Deviation (MAD) | 3419 |
| Skewness | 1.4040591 |
| Sum | 6.1216601 × 10¹⁰ |
| Variance | 38297782 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 8574 | 144122 | 1.7% |
| 1675 | 134565 | 1.6% |
| 2688 | 134542 | 1.6% |
| 1991 | 134540 | 1.6% |
| 13808 | 134536 | 1.6% |
| 2099 | 134488 | 1.5% |
| 2419 | 134484 | 1.5% |
| 4726 | 134482 | 1.5% |
| 4769 | 134480 | 1.5% |
| 16263 | 134308 | 1.5% |
| Other values (1609) | 6751227 | 77.8% |
| (Missing) | 573524 | 6.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 53 | 147 | < 0.1% |
| 54 | 145 | < 0.1% |
| 80 | 1546 | < 0.1% |
| 82 | 871 | < 0.1% |
| 88 | 1 | < 0.1% |
| 116 | 1546 | < 0.1% |
| 118 | 871 | < 0.1% |
| 124 | 1 | < 0.1% |
| 149 | 1 | < 0.1% |
| 159 | 908 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 53191 | 21 | < 0.1% |
| 52685 | 1409 | < 0.1% |
| 52684 | 151 | < 0.1% |
| 52683 | 1 | < 0.1% |
| 52681 | 7807 | 0.1% |
| 52680 | 1671 | < 0.1% |
| 52674 | 7 | < 0.1% |
| 52673 | 32 | < 0.1% |
| 52672 | 3632 | < 0.1% |
| 52671 | 267 | < 0.1% |
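Code is missing in 93.4% of rows and NCLP in 6.6%, which looks complementary. However, the two missing counts sum to 338 more than the number of rows. By inclusion-exclusion, at least 338 rows must therefore lack both values. The helper below is hypothetical.

```python
def min_rows_missing_both(missing_a: int, missing_b: int, n_rows: int) -> int:
    """Lower bound on rows where both columns are missing (inclusion-exclusion)."""
    # |A ∪ B| <= n_rows, so |A ∩ B| = |A| + |B| - |A ∪ B| >= |A| + |B| - n_rows.
    return max(0, missing_a + missing_b - n_rows)

print(min_rows_missing_both(8_106_112, 573_524, 8_679_298))  # 338
```

This is only a lower bound. The exact overlap requires a row-level count of the two missing-value masks, which the profile does not report.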
Analyte
Categorical
| Distinct | 3479 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1923 |
| Missing (%) | < 0.1% |
| Memory size | 66.2 MiB |
| s_kreatinin | 144100 |
|---|---|
| B_Hemoglobin | 134480 |
| B_Erytrocyty | 134479 |
| B_Hematokrit | 134479 |
| B_Trombocyty | 134479 |
| Other values (3474) | 7995358 |
Length
| Max length | 67 |
|---|---|
| Median length | 51 |
| Mean length | 10.656878 |
| Min length | 1 |
Characters and Unicode
| Total characters | 92473723 |
|---|---|
| Distinct characters | 110 |
| Distinct categories | 11 |
| Distinct scripts | 2 |
| Distinct blocks | 4 |
Unique
| Unique | 920 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | HBsAg konfirmace |
|---|---|
| 2nd row | Rotaviry + Noroviry stolice |
| 3rd row | s_vápník celk. |
| 4th row | s_fosfor |
| 5th row | s_hořčík |
Common Values
| Value | Count | Frequency (%) |
| s_kreatinin | 144100 | 1.7% |
| B_Hemoglobin | 134480 | 1.5% |
| B_Erytrocyty | 134479 | 1.5% |
| B_Hematokrit | 134479 | 1.5% |
| B_Trombocyty | 134479 | 1.5% |
| B_Leukocyty | 134479 | 1.5% |
| B_MCV | 134477 | 1.5% |
| B_MCH | 134477 | 1.5% |
| B_MCHC | 134477 | 1.5% |
| B_RDW | 134476 | 1.5% |
| Other values (3469) | 7322972 | 84.4% |
Length
Histogram of lengths of the category
Most occurring words
| Value | Count | Frequency (%) |
| cholest | 311720 | 2.9% |
| celk | 272070 | 2.5% |
| b_thr | 270278 | 2.5% |
| uf | 249467 | 2.3% |
| u_epitelie | 175452 | 1.6% |
| u_leukocyty | 151861 | 1.4% |
| u_válce | 151817 | 1.4% |
| s_kreatinin | 144132 | 1.3% |
| objem | 139587 | 1.3% |
| b_leukocyty | 136424 | 1.3% |
| Other values (3063) | 8889437 | 81.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| _ | 8945969 | 9.7% |
| o | 5253273 | 5.7% |
| s | 4641648 | 5.0% |
| e | 4550671 | 4.9% |
| t | 4061586 | 4.4% |
| l | 4008082 | 4.3% |
| r | 3925049 | 4.2% |
| i | 3914180 | 4.2% |
| y | 3519438 | 3.8% |
| a | 3510497 | 3.8% |
| Other values (100) | 46143330 | 49.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 59839356 | 64.7% |
| Uppercase Letter | 18266690 | 19.8% |
| Connector Punctuation | 8945969 | 9.7% |
| Space Separator | 2233708 | 2.4% |
| Other Punctuation | 1487831 | 1.6% |
| Decimal Number | 803795 | 0.9% |
| Dash Punctuation | 521520 | 0.6% |
| Math Symbol | 142496 | 0.2% |
| Open Punctuation | 113902 | 0.1% |
| Close Punctuation | 113887 | 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| o | 5253273 | 8.8% |
| s | 4641648 | 7.8% |
| e | 4550671 | 7.6% |
| t | 4061586 | 6.8% |
| l | 4008082 | 6.7% |
| r | 3925049 | 6.6% |
| i | 3914180 | 6.5% |
| y | 3519438 | 5.9% |
| a | 3510497 | 5.9% |
| n | 3279090 | 5.5% |
| Other values (33) | 19175842 | 32.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 3229915 | 17.7% |
| U | 1947634 | 10.7% |
| T | 1914675 | 10.5% |
| H | 1240246 | 6.8% |
| L | 1115814 | 6.1% |
| C | 1113063 | 6.1% |
| D | 1038222 | 5.7% |
| A | 960739 | 5.3% |
| P | 837597 | 4.6% |
| M | 706294 | 3.9% |
| Other values (23) | 4162491 | 22.8% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 1384974 | 93.1% |
| % | 62467 | 4.2% |
| / | 34598 | 2.3% |
| , | 3441 | 0.2% |
| : | 1447 | 0.1% |
| ; | 579 | < 0.1% |
| * | 254 | < 0.1% |
| # | 66 | < 0.1% |
| ? | 3 | < 0.1% |
| ! | 2 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 377494 | 47.0% |
| 2 | 214601 | 26.7% |
| 3 | 82471 | 10.3% |
| 4 | 58623 | 7.3% |
| 9 | 27872 | 3.5% |
| 5 | 14211 | 1.8% |
| 0 | 10738 | 1.3% |
| 8 | 7350 | 0.9% |
| 6 | 6765 | 0.8% |
| 7 | 3670 | 0.5% |
Math Symbol
| Value | Count | Frequency (%) |
| > | 134381 | 94.3% |
| + | 8111 | 5.7% |
| = | 4 | < 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2233702 | > 99.9% |
| (non-ASCII space) | 6 | < 0.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 521512 | > 99.9% |
| – | 8 | < 0.1% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 113016 | 99.2% |
| [ | 886 | 0.8% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 113007 | 99.2% |
| ] | 880 | 0.8% |
Other Symbol
| Value | Count | Frequency (%) |
| ™ | 4557 | 99.7% |
| ° | 12 | 0.3% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 8945969 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 78106046 | 84.5% |
| Common | 14367677 | 15.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| o | 5253273 | 6.7% |
| s | 4641648 | 5.9% |
| e | 4550671 | 5.8% |
| t | 4061586 | 5.2% |
| l | 4008082 | 5.1% |
| r | 3925049 | 5.0% |
| i | 3914180 | 5.0% |
| y | 3519438 | 4.5% |
| a | 3510497 | 4.5% |
| n | 3279090 | 4.2% |
| Other values (66) | 37442532 | 47.9% |
Common
| Value | Count | Frequency (%) |
| _ | 8945969 | 62.3% |
| (space) | 2233702 | 15.5% |
| . | 1384974 | 9.6% |
| - | 521512 | 3.6% |
| 1 | 377494 | 2.6% |
| 2 | 214601 | 1.5% |
| > | 134381 | 0.9% |
| ( | 113016 | 0.8% |
| ) | 113007 | 0.8% |
| 3 | 82471 | 0.6% |
| Other values (24) | 246550 | 1.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 90120659 | 97.5% |
| None | 2348499 | 2.5% |
| Letterlike Symbols | 4557 | < 0.1% |
| Punctuation | 8 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| _ | 8945969 | 9.9% |
| o | 5253273 | 5.8% |
| s | 4641648 | 5.2% |
| e | 4550671 | 5.0% |
| t | 4061586 | 4.5% |
| l | 4008082 | 4.4% |
| r | 3925049 | 4.4% |
| i | 3914180 | 4.3% |
| y | 3519438 | 3.9% |
| a | 3510497 | 3.9% |
| Other values (72) | 43790266 | 48.6% |
None
| Value | Count | Frequency (%) |
| í | 659737 | 28.1% |
| á | 573573 | 24.4% |
| č | 279755 | 11.9% |
| é | 262785 | 11.2% |
| ó | 251870 | 10.7% |
| ř | 104270 | 4.4% |
| ž | 94351 | 4.0% |
| ý | 51167 | 2.2% |
| š | 30613 | 1.3% |
| ů | 26985 | 1.1% |
| Other values (16) | 13393 | 0.6% |
Letterlike Symbols
| Value | Count | Frequency (%) |
| ™ | 4557 | 100.0% |
Punctuation
| Value | Count | Frequency (%) |
| – | 8 | 100.0% |
ValueNumber
Categorical
| Distinct | 23084 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 1338881 |
| Missing (%) | 15.4% |
| Memory size | 66.2 MiB |
| 0 | 435532 |
|---|---|
| 1 | 68671 |
| 2 | 50246 |
| 5 | 44935 |
| 5.5 | 41213 |
| Other values (23079) | 6699820 |
Length
| Max length | 8 |
|---|---|
| Median length | 7 |
| Mean length | 3.3814182 |
| Min length | 1 |
Characters and Unicode
| Total characters | 24821020 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 4 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 8522 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | 2.37 |
|---|---|
| 2nd row | 1.32 |
| 3rd row | 0.75 |
| 4th row | 138.9 |
| 5th row | 10.36 |
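ValueNumber is profiled as categorical text even though its samples (2.37, 1.32, 138.9) are decimal numbers. Converting it needs care, because the column contains 12,206 space characters and 12,206 dash characters, so some entries are not plain numbers.

The conservative sketch below keeps unparseable entries as None instead of guessing what they mean. The function name and the "1 - 2" test input are hypothetical.

```python
from typing import Optional

def to_float(text: Optional[str]) -> Optional[float]:
    """Convert a ValueNumber string to float; None if missing or not a plain number."""
    if text is None:
        return None
    try:
        return float(text)
    except ValueError:
        return None  # e.g. entries with an internal space; left for manual review

print([to_float(s) for s in ["2.37", "138.9", "1 - 2", None]])  # hypothetical inputs
```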
Common Values
| Value | Count | Frequency (%) |
| 0 | 435532 | 5.0% |
| 1 | 68671 | 0.8% |
| 2 | 50246 | 0.6% |
| 5 | 44935 | 0.5% |
| 5.5 | 41213 | 0.5% |
| 3 | 40192 | 0.5% |
| 0.3 | 39812 | 0.5% |
| 0.4 | 38905 | 0.4% |
| 0.02 | 36362 | 0.4% |
| 0.2 | 35349 | 0.4% |
| Other values (23074) | 6509200 | 75.0% |
| (Missing) | 1338881 | 15.4% |
Length
Histogram of lengths of the category
Most occurring words
| Value | Count | Frequency (%) |
| 0 | 435532 | 5.9% |
| 1 | 68935 | 0.9% |
| 2 | 50451 | 0.7% |
| 5 | 45024 | 0.6% |
| 5.5 | 41304 | 0.6% |
| 3 | 40390 | 0.5% |
| 0.3 | 40100 | 0.5% |
| 0.4 | 39206 | 0.5% |
| 0.02 | 36362 | 0.5% |
| 0.2 | 35645 | 0.5% |
| Other values (22863) | 6519674 | 88.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| . | 5542547 | 22.3% |
| 1 | 3388325 | 13.7% |
| 0 | 2682178 | 10.8% |
| 3 | 2290612 | 9.2% |
| 2 | 2112167 | 8.5% |
| 4 | 2019656 | 8.1% |
| 5 | 1609675 | 6.5% |
| 6 | 1345696 | 5.4% |
| 7 | 1277717 | 5.1% |
| 8 | 1268324 | 5.1% |
| Other values (3) | 1284123 | 5.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 19254061 | 77.6% |
| Other Punctuation | 5542547 | 22.3% |
| Space Separator | 12206 | < 0.1% |
| Dash Punctuation | 12206 | < 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 3388325 | 17.6% |
| 0 | 2682178 | 13.9% |
| 3 | 2290612 | 11.9% |
| 2 | 2112167 | 11.0% |
| 4 | 2019656 | 10.5% |
| 5 | 1609675 | 8.4% |
| 6 | 1345696 | 7.0% |
| 7 | 1277717 | 6.6% |
| 8 | 1268324 | 6.6% |
| 9 | 1259711 | 6.5% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 5542547 | 100.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 12206 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 12206 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 24821020 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 5542547 | 22.3% |
| 1 | 3388325 | 13.7% |
| 0 | 2682178 | 10.8% |
| 3 | 2290612 | 9.2% |
| 2 | 2112167 | 8.5% |
| 4 | 2019656 | 8.1% |
| 5 | 1609675 | 6.5% |
| 6 | 1345696 | 5.4% |
| 7 | 1277717 | 5.1% |
| 8 | 1268324 | 5.1% |
| Other values (3) | 1284123 | 5.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 24821020 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| . | 5542547 | 22.3% |
| 1 | 3388325 | 13.7% |
| 0 | 2682178 | 10.8% |
| 3 | 2290612 | 9.2% |
| 2 | 2112167 | 8.5% |
| 4 | 2019656 | 8.1% |
| 5 | 1609675 | 6.5% |
| 6 | 1345696 | 5.4% |
| 7 | 1277717 | 5.1% |
| 8 | 1268324 | 5.1% |
| Other values (3) | 1284123 | 5.2% |
ValueText
Categorical
| Distinct | 32847 |
|---|---|
| Distinct (%) | 2.5% |
| Missing | 7340434 |
| Missing (%) | 84.6% |
| Memory size | 66.2 MiB |
| negativní | 485411 |
|---|---|
| normalní nález | 138384 |
| normal | 105600 |
| ojediněle | 65120 |
| normální nález | 60485 |
| Other values (32842) | 483864 |
Length
| Max length | 1920 |
|---|---|
| Median length | 1807 |
| Mean length | 17.411001 |
| Min length | 1 |
Characters and Unicode
| Total characters | 23310962 |
|---|---|
| Distinct characters | 134 |
| Distinct categories | 13 |
| Distinct scripts | 2 |
| Distinct blocks | 3 |
Unique
| Unique | 26196 |
|---|---|
| Unique (%) | 2.0% |
Sample
| 1st row | externě |
|---|---|
| 2nd row | odběr |
| 3rd row | normal |
| 4th row | negativní |
| 5th row | negativní |
Common Values
| Value | Count | Frequency (%) |
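| negativní [English: negative] | 485411 | 5.6% |
| normalní nález [English: normal finding] | 138384 | 1.6% |
| normal | 105600 | 1.2% |
| ojediněle [English: occasional] | 65120 | 0.8% |
| normální nález [English: normal finding] | 60485 | 0.7% |
| ordinace [English: doctor's office] | 50572 | 0.6% |
| nelze hodnotit [English: cannot be evaluated] | 37990 | 0.4% |
| odběr [English: sample collection] | 27114 | 0.3% |
| Není známo, zda pacient užívá Warfarin (kumariny) [English: it is not known whether the patient takes warfarin (coumarins)] | 20592 | 0.2% |
| POZOR!!! Změna principu stanovení močového sedimentu (průtoková cytometrie). [English: ATTENTION!!! Change in the method of determining urinary sediment (flow cytometry).] | 18463 | 0.2% |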
| Other values (32837) | 329133 | 3.8% |
| (Missing) | 7340434 | 84.6% |
Length
Histogram of lengths of the category
Most occurring words
| Value | Count | Frequency (%) |
| negativní | 488157 | 16.4% |
| nález | 199216 | 6.7% |
| normalní | 138384 | 4.7% |
| normal | 105650 | 3.6% |
| pacient | 71962 | 2.4% |
| ojediněle | 65372 | 2.2% |
| (blank) | 63172 | 2.1% |
| užívá | 62016 | 2.1% |
| normální | 60644 | 2.0% |
| ordinace | 50576 | 1.7% |
| Other values (32105) | 1669227 | 56.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 2792801 | 12.0% |
| (space) | 1724095 | 7.4% |
| e | 1683160 | 7.2% |
| a | 1485544 | 6.4% |
| o | 1331305 | 5.7% |
| i | 1186964 | 5.1% |
| t | 1032153 | 4.4% |
| í | 932939 | 4.0% |
| r | 895491 | 3.8% |
| v | 850125 | 3.6% |
| Other values (124) | 9396385 | 40.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 18388306 | 78.9% |
| Space Separator | 1724095 | 7.4% |
| Uppercase Letter | 1175047 | 5.0% |
| Decimal Number | 706778 | 3.0% |
| Other Punctuation | 597388 | 2.6% |
| Control | 225098 | 1.0% |
| Dash Punctuation | 201138 | 0.9% |
| Math Symbol | 123628 | 0.5% |
| Open Punctuation | 77318 | 0.3% |
| Close Punctuation | 77257 | 0.3% |
| Other values (3) | 14909 | 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| n | 2792801 | 15.2% |
| e | 1683160 | 9.2% |
| a | 1485544 | 8.1% |
| o | 1331305 | 7.2% |
| i | 1186964 | 6.5% |
| t | 1032153 | 5.6% |
| í | 932939 | 5.1% |
| r | 895491 | 4.9% |
| v | 850125 | 4.6% |
| l | 809443 | 4.4% |
| Other values (35) | 5388381 |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 116360 | 9.9% |
| O | 101828 | 8.7% |
| N | 91753 | 7.8% |
| A | 63780 | 5.4% |
| R | 63176 | 5.4% |
| E | 55901 | 4.8% |
| L | 54974 | 4.7% |
| M | 54278 | 4.6% |
| T | 53950 | 4.6% |
| Z | 53932 | 4.6% |
| Other values (34) | 465115 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 269783 | |
| , | 111861 | |
| : | 110138 | |
| ! | 72319 | 12.1% |
| / | 19252 | 3.2% |
| * | 4323 | 0.7% |
| % | 2824 | 0.5% |
| ' | 2313 | 0.4% |
| @ | 2071 | 0.3% |
| " | 1171 | 0.2% |
| Other values (4) | 1333 | 0.2% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 185691 | |
| 1 | 136417 | |
| 2 | 126011 | |
| 3 | 53289 | 7.5% |
| 5 | 49627 | 7.0% |
| 4 | 45369 | 6.4% |
| 8 | 31105 | 4.4% |
| 6 | 29922 | 4.2% |
| 9 | 25107 | 3.6% |
| 7 | 24240 | 3.4% |
Math Symbol
| Value | Count | Frequency (%) |
| + | 79362 | |
| < | 37102 | |
| > | 4171 | 3.4% |
| = | 2370 | 1.9% |
| \| | 623 | 0.5% |
Control
| Value | Count | Frequency (%) |
| (control character) | 150058 | |
| (control character) | 75038 | |
| (control character) | 2 | < 0.1% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 77293 | |
| „ | 21 | < 0.1% |
| [ | 4 | < 0.1% |
Modifier Symbol
| Value | Count | Frequency (%) |
| ´ | 61 | |
| ` | 42 | |
| ¨ | 4 | 3.7% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 201137 | |
| – | 1 | < 0.1% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 77253 | |
| ] | 4 | < 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1724095 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 14717 |
Other Symbol
| Value | Count | Frequency (%) |
| ° | 85 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 19563176 | |
| Common | 3747786 | 16.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| n | 2792801 | |
| e | 1683160 | 8.6% |
| a | 1485544 | 7.6% |
| o | 1331305 | 6.8% |
| i | 1186964 | 6.1% |
| t | 1032153 | 5.3% |
| í | 932939 | 4.8% |
| r | 895491 | 4.6% |
| v | 850125 | 4.3% |
| l | 809443 | 4.1% |
| Other values (78) | 6563251 |
Common
| Value | Count | Frequency (%) |
| (space) | 1724095 | |
| . | 269783 | 7.2% |
| - | 201137 | 5.4% |
| 0 | 185691 | 5.0% |
| (control character) | 150058 | 4.0% |
| 1 | 136417 | 3.6% |
| 2 | 126011 | 3.4% |
| , | 111861 | 3.0% |
| : | 110138 | 2.9% |
| + | 79362 | 2.1% |
| Other values (36) | 653233 | 17.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 21227794 | |
| None | 2083146 | 8.9% |
| Punctuation | 22 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| n | 2792801 | 13.2% |
| (space) | 1724095 | 8.1% |
| e | 1683160 | 7.9% |
| a | 1485544 | 7.0% |
| o | 1331305 | 6.3% |
| i | 1186964 | 5.6% |
| t | 1032153 | 4.9% |
| r | 895491 | 4.2% |
| v | 850125 | 4.0% |
| l | 809443 | 3.8% |
| Other values (82) | 7436713 |
None
| Value | Count | Frequency (%) |
| í | 932939 | |
| á | 517862 | |
| ě | 177918 | 8.5% |
| ž | 105876 | 5.1% |
| é | 79957 | 3.8% |
| ř | 56075 | 2.7% |
| ý | 51172 | 2.5% |
| č | 50696 | 2.4% |
| ů | 36807 | 1.8% |
| š | 21368 | 1.0% |
| Other values (30) | 52476 | 2.5% |
Punctuation
| Value | Count | Frequency (%) |
| „ | 21 | |
| – | 1 | 4.5% |
Unit
| Distinct | 195 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1383102 |
| Missing (%) | 15.9% |
| Memory size | 66.2 MiB |
Length
| Max length | 22 |
|---|---|
| Median length | 15 |
| Mean length | 4.6795038 |
| Min length | 1 |
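As a quick consistency check (not part of the generated report), the mean length matches the total character count listed under Characters and Unicode (34142577) divided by the number of non-missing Unit values, taking the observation count from the dataset overview:

$$
\bar{\ell} = \frac{34142577}{8679298 - 1383102} = \frac{34142577}{7296196} \approx 4.6795038
$$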
Characters and Unicode
| Total characters | 34142577 |
|---|---|
| Distinct characters | 70 |
| Distinct categories | 9 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
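The category tables in this report can be approximated with Python's standard `unicodedata` module, whose `category()` function returns the two-letter Unicode general category of a character. The sketch below is illustrative only and is not the profiler's own implementation; `CATEGORY_NAMES` and `category_counts` are hypothetical helpers, and scripts and blocks are not covered because `unicodedata` does not expose them.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes seen in this report
# (hypothetical helper table; unknown codes fall back to the 2-letter code).
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation",
    "Sm": "Math Symbol",
    "Sk": "Modifier Symbol",
    "So": "Other Symbol",
    "Zs": "Space Separator",
    "Cc": "Control",
}


def category_counts(values):
    """Count characters per Unicode general category over all non-missing strings."""
    counts = Counter()
    for text in values:
        if text is None:  # missing cells contribute no characters
            continue
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# A few Unit values from this section, plus one missing cell.
print(category_counts(["mmol/l", "x10^9/l", None, "µkat/l"]))
```

Note that `^` falls under Modifier Symbol and `µ` (micro sign) under Lowercase Letter, which is consistent with the Unit tables in this section.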
Unique
| Unique | 19 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | mmol/l |
|---|---|
| 2nd row | mmol/l |
| 3rd row | mmol/l |
| 4th row | umol/l |
| 5th row | mmol/l |
Common Values
| Value | Count | Frequency (%) |
| mmol/l | 1478479 | |
| % | 868045 | 10.0% |
| x10^9/l | 403422 | 4.6% |
| fl | 401148 | 4.6% |
| g/l | 397441 | 4.6% |
| ukat/l | 343569 | 4.0% |
| 10^9/l | 322382 | 3.7% |
| umol/l | 246669 | 2.8% |
| µkat/l | 236565 | 2.7% |
| /uL | 224902 | 2.6% |
| Other values (185) | 2373574 | |
| (Missing) | 1383102 | 15.9% |
Length
Histogram of lengths of the category
Most occurring words
| Value | Count | Frequency (%) |
| mmol/l | 1501468 | |
| | 974603 | |
| x10^9/l | 403422 | 5.4% |
| fl | 401203 | 5.4% |
| g/l | 397527 | 5.3% |
| ukat/l | 343753 | 4.6% |
| 10^9/l | 322389 | 4.3% |
| ul | 269842 | 3.6% |
| umol/l | 246762 | 3.3% |
| µkat/l | 236597 | 3.2% |
| Other values (159) | 2355249 |
Most occurring characters
| Value | Count | Frequency (%) |
| l | 7578497 | |
| / | 5429095 | |
| m | 4615170 | |
| o | 2399525 | 7.0% |
| 1 | 1214359 | 3.6% |
| 0 | 922814 | 2.7% |
| u | 891088 | 2.6% |
| ^ | 872267 | 2.6% |
| % | 871027 | 2.6% |
| g | 833474 | 2.4% |
| Other values (60) | 8515261 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 21727754 | |
| Other Punctuation | 6795749 | 19.9% |
| Decimal Number | 3588693 | 10.5% |
| Uppercase Letter | 895158 | 2.6% |
| Modifier Symbol | 872267 | 2.6% |
| Space Separator | 156619 | 0.5% |
| Dash Punctuation | 106315 | 0.3% |
| Math Symbol | 14 | < 0.1% |
| Connector Punctuation | 8 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| l | 7578497 | |
| m | 4615170 | |
| o | 2399525 | 11.0% |
| u | 891088 | 4.1% |
| g | 833474 | 3.8% |
| k | 794400 | 3.7% |
| a | 748953 | 3.4% |
| t | 614977 | 2.8% |
| µ | 608736 | 2.8% |
| x | 537592 | 2.5% |
| Other values (19) | 2105342 | 9.7% |
Uppercase Letter
| Value | Count | Frequency (%) |
| L | 426693 | |
| I | 152789 | 17.1% |
| U | 143961 | 16.1% |
| P | 60302 | 6.7% |
| R | 44442 | 5.0% |
| N | 41656 | 4.7% |
| E | 5741 | 0.6% |
| Y | 5270 | 0.6% |
| A | 4148 | 0.5% |
| F | 3327 | 0.4% |
| Other values (9) | 6829 | 0.8% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 1214359 | |
| 0 | 922814 | |
| 9 | 775573 | |
| 2 | 287393 | 8.0% |
| 3 | 223788 | 6.2% |
| 7 | 147876 | 4.1% |
| 6 | 12125 | 0.3% |
| 4 | 4747 | 0.1% |
| 5 | 18 | < 0.1% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 5429095 | |
| % | 871027 | 12.8% |
| . | 347724 | 5.1% |
| , | 147872 | 2.2% |
| * | 29 | < 0.1% |
| " | 2 | < 0.1% |
Math Symbol
| Value | Count | Frequency (%) |
| < | 6 | |
| > | 6 | |
| + | 2 | 14.3% |
Modifier Symbol
| Value | Count | Frequency (%) |
| ^ | 872267 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 156619 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 106315 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 8 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 22014176 | |
| Common | 12128401 | 35.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| l | 7578497 | |
| m | 4615170 | |
| o | 2399525 | 10.9% |
| u | 891088 | 4.0% |
| g | 833474 | 3.8% |
| k | 794400 | 3.6% |
| a | 748953 | 3.4% |
| t | 614977 | 2.8% |
| x | 537592 | 2.4% |
| L | 426693 | 1.9% |
| Other values (37) | 2573807 | 11.7% |
Common
| Value | Count | Frequency (%) |
| / | 5429095 | |
| 1 | 1214359 | 10.0% |
| 0 | 922814 | 7.6% |
| ^ | 872267 | 7.2% |
| % | 871027 | 7.2% |
| 9 | 775573 | 6.4% |
| µ | 608736 | 5.0% |
| . | 347724 | 2.9% |
| 2 | 287393 | 2.4% |
| 3 | 223788 | 1.8% |
| Other values (13) | 575625 | 4.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 33488291 | |
| None | 654286 | 1.9% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| l | 7578497 | |
| / | 5429095 | |
| m | 4615170 | |
| o | 2399525 | 7.2% |
| 1 | 1214359 | 3.6% |
| 0 | 922814 | 2.8% |
| u | 891088 | 2.7% |
| ^ | 872267 | 2.6% |
| % | 871027 | 2.6% |
| g | 833474 | 2.5% |
| Other values (55) | 7860975 |
None
| Value | Count | Frequency (%) |
| µ | 608736 | |
| č | 43054 | 6.6% |
| í | 2382 | 0.4% |
| Č | 107 | < 0.1% |
| ě | 7 | < 0.1% |
Auto
The auto setting is an interpretable pairwise column metric that uses the following mapping (variable type pair: method, range):
- Categorical-Categorical: Cramér's V, [0, 1]
- Numerical-Categorical: Cramér's V, [0, 1] (using a discretized numerical column)
- Numerical-Numerical: Spearman's ρ, [-1, 1]
This configuration uses the recommended metric for each pair of columns.
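For categorical pairs the auto setting uses Cramér's V. A minimal sketch of the textbook statistic follows, V = sqrt(χ² / (n · (min(r, c) - 1))), where χ² is Pearson's chi-squared statistic of the r × c contingency table. Whether the profiler applies a bias correction, and how it discretizes numerical columns, is not stated here, so this sketch uses the plain formula; `cramers_v` is a hypothetical helper.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V (no bias correction) for two equal-length categorical sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    joint = Counter(zip(x, y))  # observed count for each (x, y) cell
    rows = Counter(x)           # marginal counts of x categories
    cols = Counter(y)           # marginal counts of y categories
    # Pearson chi-squared: sum of (observed - expected)^2 / expected over every
    # cell of the contingency table, including cells that were never observed.
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    if k == 0:
        return float("nan")  # a constant column: V is undefined
    return math.sqrt(chi2 / (n * k))


# Perfect association between two categorical columns, then none at all.
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))
```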
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
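A minimal pure-Python sketch of that recipe, assuming tied values receive the average of the ranks they occupy (the usual convention). It is illustrative and not pandas-profiling's own code; `average_ranks`, `pearson`, and `spearman` are hypothetical helper names. The example uses y = x³, a nonlinear but monotonic relationship, for which ρ is exactly 1 while Pearson's r stays below 1.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance over product of std devs (the 1/n factors cancel, so they are omitted)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
cubes = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman(x, cubes), pearson(x, cubes))
```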
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate shifts and positive rescalings of the two variables, so for an exact, non-horizontal linear relationship the steepness of the line does not affect r; only the sign of the slope does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
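The sketch below implements the covariance-over-standard-deviations definition in plain Python (a hypothetical `pearson` helper, not the profiler's code) and checks the invariance property on made-up numbers: shifting or positively rescaling either variable leaves r unchanged, while a negative scale factor flips its sign. r is undefined when either variable is constant; this sketch then raises `ZeroDivisionError`.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their std devs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.5, 9.0]
r = pearson(x, y)
# Shifting and positively rescaling either variable leaves r unchanged ...
r_shifted = pearson([10 * a - 5 for a in x], [0.5 * b + 100 for b in y])
# ... while a negative scale factor flips its sign.
r_flipped = pearson([-a for a in x], y)
print(round(r, 6), round(r_shifted, 6), round(r_flipped, 6))
```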
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation. To calculate τ for two variables X and Y, compare every pair of observations: a pair is concordant if X and Y are ordered the same way in both observations and discordant if they are ordered oppositely. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2 for n observations.
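This sketch counts concordant and discordant pairs exactly as described, which gives the variant usually called tau-a; pairs tied in X or Y count as neither. Library routines commonly default to a tie-corrected variant (tau-b), so results can differ when the data contain ties. `kendall_tau_a` is a hypothetical name, and its O(n²) pair loop is only meant for small inputs.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # same ordering in x and y
        elif s < 0:
            discordant += 1   # opposite ordering
        # s == 0: tied in x or y, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


# Pairs (1,2) and (1,3) are concordant, (2,3) is discordant: (2 - 1) / 3.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```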
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
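One common way to compute nullity correlation is Pearson's r between the missing-value indicators (1 if missing, 0 if present) of two columns. The sketch below assumes that definition and represents missing cells as `None`; `nullity_correlation` is a hypothetical helper. The toy columns reuse the ValueNumber and ValueText entries of the first four rows of the sample table, where each row fills exactly one of the two fields, so the result is -1.

```python
import math


def nullity_correlation(a, b):
    """Pearson correlation between the missing-value indicators of two columns.

    Returns None when either column is always present or always missing,
    because its indicator then has zero variance and r is undefined.
    """
    ia = [1.0 if v is None else 0.0 for v in a]
    ib = [1.0 if v is None else 0.0 for v in b]
    n = len(ia)
    ma, mb = sum(ia) / n, sum(ib) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(ia, ib))
    va = sum((p - ma) ** 2 for p in ia)
    vb = sum((q - mb) ** 2 for q in ib)
    if va == 0 or vb == 0:
        return None
    return cov / math.sqrt(va * vb)


value_number = [None, None, 2.37, 1.32]
value_text = ["externě", "odběr", None, None]
print(nullity_correlation(value_number, value_text))
```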
First rows
| | Patient | Report | ID | EntryDate | EntryTime | Code | NCLP | Analyte | ValueNumber | ValueText | RefHigh | RefLow | Unit | NLCP_E | NCLP_E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 324729 | 27200171 | 154214712 | 30.09.2022 | 30.12.1899 7:42:00 | SZU-HEP_0004__R | NaN | HBsAg konfirmace | NaN | externě | NaN | NaN | NaN | ||
| 1 | 324729 | 27200171 | 154214720 | 30.09.2022 | 30.12.1899 7:42:00 | ZU-OKM_0024__R | NaN | Rotaviry + Noroviry stolice | NaN | odběr | NaN | NaN | NaN | ||
| 2 | 324729 | 1453155 | 18244355 | 16.04.2015 | 30.12.1899 7:31:00 | NaN | 3482.0 | s_vápník celk. | 2.37 | NaN | 2.55 | 2.15 | mmol/l | 52672.0 | 52672.0 |
| 3 | 324729 | 1453155 | 18244356 | 16.04.2015 | 30.12.1899 7:31:00 | NaN | 2618.0 | s_fosfor | 1.32 | NaN | 1.23 | 0.71 | mmol/l | 2618.0 | 2618.0 |
| 4 | 324729 | 1453155 | 18244357 | 16.04.2015 | 30.12.1899 7:31:00 | NaN | 3940.0 | s_hořčík | 0.75 | NaN | 0.94 | 0.71 | mmol/l | 3940.0 | 3940.0 |
| 5 | 324729 | 1453155 | 18244358 | 16.04.2015 | 30.12.1899 7:31:00 | NaN | 8574.0 | s_kreatinin | 138.9 | NaN | 104.0 | 64.0 | umol/l | 8574.0 | 8574.0 |
| 6 | 324729 | 1453155 | 18244359 | 16.04.2015 | 30.12.1899 7:31:00 | NaN | 1896.0 | p_glukóza (NaF) | 10.36 | NaN | 5.59 | 3.60 | mmol/l | 51802.0 | 51802.0 |
| 7 | 324729 | 1453155 | 18244360 | 16.04.2015 | 30.12.1899 7:31:00 | NaN | 1350.0 | s_cholesterol | 3.5 | NaN | 5.0 | 2.9 | mmol/l | 1350.0 | 1350.0 |
| 8 | 324729 | 1453155 | 18244361 | 16.04.2015 | 30.12.1899 7:31:00 | NaN | 12374.0 | s_TAG | 1.77 | NaN | 1.69 | 0.50 | mmol/l | 12374.0 | 12374.0 |
| 9 | 324729 | 1453155 | 18244362 | 16.04.2015 | 30.12.1899 7:31:00 | NaN | 2036.0 | s_HDL cholest. | 0.93 | NaN | 2.1 | 1.00 | mmol/l | 2036.0 | 2036.0 |
Last rows
| | Patient | Report | ID | EntryDate | EntryTime | Code | NCLP | Analyte | ValueNumber | ValueText | RefHigh | RefLow | Unit | NLCP_E | NCLP_E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8679288 | 335472 | 26490462 | 140942842 | 13.05.2022 | 30.12.1899 6:47:00 | NaN | 9189.0 | U_válce hyalinní UF | 3 | NaN | 2.0 | 0 | /µL | 9189.0 | 9189.0 |
| 8679289 | 335472 | 26490462 | 140942843 | 13.05.2022 | 30.12.1899 6:47:00 | NaN | 15160.0 | U_epitelie renální UF | NaN | normální nález | NaN | NaN | NaN | 15160.0 | 15160.0 |
| 8679290 | 335472 | 26490462 | 140942844 | 13.05.2022 | 30.12.1899 6:47:00 | NaN | 14013.0 | U_epitelie přechodné UF | 0 | NaN | 0.0 | 0 | /µL | 14013.0 | 14013.0 |
| 8679291 | 335472 | 26490462 | 140942845 | 13.05.2022 | 30.12.1899 6:47:00 | NaN | 14011.0 | U_epitelie dlaž.UF | 1 | NaN | 20.0 | 0 | /µL | 14011.0 | 14011.0 |
| 8679292 | 335472 | 26490462 | 140942846 | 13.05.2022 | 30.12.1899 6:47:00 | NaN | 3414.0 | U_protein celk. | NaN | negativní | NaN | NaN | NaN | 3414.0 | 3414.0 |
| 8679293 | 335472 | 26490462 | 140942847 | 13.05.2022 | 30.12.1899 6:47:00 | NaN | 3386.0 | U_shluky leukocytů UF | 0 | NaN | NaN | NaN | arb.j. | 3386.0 | 3386.0 |
| 8679294 | 335472 | 26490462 | 140942848 | 13.05.2022 | 30.12.1899 6:47:00 | NaN | 3280.0 | U_bilirubin | NaN | negativní | NaN | NaN | NaN | 3280.0 | 3280.0 |
| 8679295 | 335472 | 26490462 | 140942849 | 13.05.2022 | 30.12.1899 6:47:00 | NaN | 9322.0 | U_leukocyty UF | 5 | NaN | 10.0 | 0 | /µL | 9322.0 | 9322.0 |
| 8679296 | 335472 | 26490462 | 140942850 | 13.05.2022 | 30.12.1899 6:47:00 | NaN | 3078.0 | s_kys.močová | 327 | NaN | 420.0 | 210 | µmol/l | 3078.0 | 3078.0 |
| 8679297 | 335472 | 27409261 | 158447953 | 29.09.2022 | 30.12.1899 6:44:00 | NaN | 15194.0 | B_HbA1c | 53 | NaN | 42.0 | 20.0 | mmol/mol | 15194.0 | 15194.0 |